Computing devices are often used for guidance in following step-by-step instructions. For example, people routinely bring smartphones, tablets, or laptops into their kitchens, workshops, or other workspaces so that they can follow instructions loaded from the web via a web browser. To leave a user's hands free to perform the task, some such instructions have been provided audibly by computing devices such as an intelligent speaker device, which is a computing device that provides an audio-based user interface through which interaction with a digital assistant can occur. The audio-based user interface of the intelligent speaker device can include at least one speaker and one or more microphones, such as a far-field microphone or far-field microphone array. Such an intelligent speaker device may include one or more other user interface devices, such as one or more computer displays, but intelligent speaker devices often do not include displays. A digital assistant is a computer component that is configured to process natural language input and to respond with natural language dialog scripts to conduct a conversational natural language dialog, as is discussed in more detail below.